Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes
Authors
Abstract
Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly relevant and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network named Mask TextSpotter is presented. Different from previous text spotters that follow a pipeline consisting of a proposal generation network and a sequence-to-sequence recognition network, Mask TextSpotter enjoys a simple and smooth end-to-end learning procedure, in which both detection and recognition can be achieved directly from two-dimensional space via semantic segmentation. Further, a spatial attention module is proposed to enhance the performance and universality. Benefiting from the proposed two-dimensional representation on both detection and recognition, it easily handles text instances of irregular shapes, for instance, curved text. We evaluate it on four English datasets and one multi-language dataset, achieving consistently superior performance over state-of-the-art methods in both detection and end-to-end text recognition tasks. Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.
Similar resources
An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog
We present a novel end-to-end trainable neural network model for task-oriented dialog systems. The model is able to track dialog state, issue API calls to knowledge base (KB), and incorporate structured KB query results into system responses to successfully complete task-oriented dialogs. The proposed model produces well-structured system responses by jointly learning belief tracking and KB res...
Fast End-to-End Trainable Guided Filter
Image processing and pixel-wise dense prediction have been advanced by harnessing the capabilities of deep learning. One central issue of deep learning is the limited capacity to handle joint upsampling. We present a deep learning building block for joint upsampling, namely guided filtering layer. This layer aims at efficiently generating the high-resolution output given the corresponding low-re...
End-to-End Trainable Attentive Decoder for Hierarchical Entity Classification
We address fine-grained entity classification and propose a novel attention-based recurrent neural network (RNN) encoder-decoder that generates paths in the type hierarchy and can be trained end-to-end. We show that our model performs better on fine-grained entity classification than prior work that relies on flat or local classifiers that do not directly model hierarchical structure.
Supervised Hashing with End-to-End Binary Deep Neural Network
Image hashing is a popular technique applied to large scale content-based visual retrieval due to its compact and efficient binary codes. Our work proposes a new end-to-end deep network architecture for supervised hashing which directly learns binary codes from input images and maintains good properties over binary codes such as similarity preservation, independence, and balancing. Furthermore,...
End-to-End Deep Neural Network for Automatic Speech Recognition
We investigate the efficacy of deep neural networks on speech recognition. Specifically, we implement an end-to-end deep learning system that utilizes mel-filter bank features to directly output spoken phonemes without the need for a traditional Hidden Markov Model for decoding. The system will comprise two variants of neural networks for phoneme recognition. In particular, we utilize conv...
Journal
Journal title: IEEE Transactions on Pattern Analysis and Machine Intelligence
Year: 2021
ISSN: ['1939-3539', '2160-9292', '0162-8828']
DOI: https://doi.org/10.1109/tpami.2019.2937086